~GitHub/inattention-populationsample/code/inattention-populationsample-data-prep.Rmd
Inattentive behavior is associated with academic problems. The present study investigates primary school teacher reports on nine items reflecting different aspects of inattention, with an aim to reveal patterns of behavior predicting high-school academic achievement. To that end, we used different types of pattern analysis and machine learning methods.
Inattention in a sample of 2397 individuals was rated by their primary school teachers when they participated in the first wave of the Bergen Child Study (BCS) (7-9 years old), and their academic achievements were available from an official school register when they attended high school (16-19 years old). Inattention was assessed by nine items rated at a categorical level, and the academic achievement scores were divided into three parts, each including a similar number of participants.
Boys obtained higher inattention scores and lower academic scores than girls. Inattention problems related to sustained attention and distractibility turned out to have the highest predictive value for academic achievement level across all selected statistical analyses, and the full model showed that inattention explained about 10% of the variance in high school scores about 10 years later. A multinomial regression analysis showed a high odds ratio of being allocated to the lowest academic achievement category, while classification trees revealed a pattern of problems related to sustained attention and distractibility. By including recursive learning algorithms, the most successful classification was found between these inattention items and the highest level of achievement scores.
The present study showed the importance of a pattern of early problems related to sustained attention and distractibility in predicting future academic results. By including different statistical classification models we showed that this pattern was fairly consistent. Furthermore, calculation of classification errors gave information about the uncertainty when predicting the outcome for individual children. Further studies should include a wider range of variables.
Organization of the data and the analysis:
Libraries being used:
Input file:
Output files (data):
# fn <- "../data2/inattention_Arvid_new.sav"
fn <- "../Dropbox/Arvid_inatteion/data2/inattention_Arvid_new.sav"
# The original SPSS file as provided to AJL is
# 'inattention_Astri_94_96_new_grades_updated.sav',
# which was edited and reduced by AJL to 'inattention_Arvid_new.sav'
# Import data stored in the SPSS format
library(memisc)
Loading required package: lattice
Loading required package: MASS
Attaching package: ‘memisc’
The following objects are masked from ‘package:stats’:
contr.sum, contr.treatment, contrasts
The following object is masked from ‘package:base’:
as.array
# fn <- "../data2/inattention_Arvid_new.sav"
fn <- "/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav"
data <- as.data.set(spss.system.file(fn))
# Make new data frame from the sample with the variables
# gender, grade, SNAP1, ..., SNAP9 (vars #1-11) and
# academic_achievement (var #52)
names(data)
[1] "gender" "grade" "snap1"
[4] "snap2" "snap3" "snap4"
[7] "snap5" "snap6" "snap7"
[10] "snap8" "snap9" "snap10"
[13] "snap11" "snap12" "snap13"
[16] "snap14" "snap15" "snap16"
[19] "snap17" "snap18" "y_4_asrs_1"
[22] "y_4_asrs_2" "y_4_asrs_3" "y_4_asrs_4"
[25] "y_4_asrs_5" "y_4_asrs_6" "y_4_asrs_7"
[28] "y_4_asrs_8" "y_4_asrs_9" "y_4_asrs_10"
[31] "y_4_asrs_11" "y_4_asrs_12" "y_4_asrs_13"
[34] "y_4_asrs_14" "y_4_asrs_15" "y_4_asrs_16"
[37] "y_4_asrs_17" "y_4_asrs_18" "y_4_mfq_1"
[40] "y_4_mfq_2" "y_4_mfq_3" "y_4_mfq_4"
[43] "y_4_mfq_5" "y_4_mfq_6" "y_4_mfq_7"
[46] "y_4_mfq_8" "y_4_mfq_9" "y_4_mfq_10"
[49] "y_4_mfq_11" "y_4_mfq_12" "y_4_mfq_13"
[52] "academic_achievement"
d <- data[, c(1:11, 52)]
dim(d)
[1] 10870 12
names(d)
[1] "gender" "grade" "snap1"
[4] "snap2" "snap3" "snap4"
[7] "snap5" "snap6" "snap7"
[10] "snap8" "snap9" "academic_achievement"
str(d)
Data set with 10870 obs. of 12 variables:
$ gender : Nmnl. item w/ 2 labels for 0,1 num NA NA NA NA NA NA NA NA NA NA ...
$ grade : Itvl. item + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap1 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap2 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap3 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap4 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap5 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap6 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap7 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap8 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ snap9 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num NA NA NA NA NA NA NA NA NA NA ...
$ academic_achievement: Itvl. item num 2.86 NA 3 3.67 4.1 ...
summary(d)
gender grade snap1 snap2
Girl:5528 Min. : 2.00 Not true :2646 Not true :2698
Boy :4978 1st Qu. : 2.00 Somewhat true : 350 Somewhat true : 294
* : 0 Median : 3.00 Certainly true: 61 Certainly true: 65
NAs : 364 Mean : 2.84 * : 0 * : 0
3rd Qu. : 3.50 NAs :7813 NAs :7813
Max. : 4.00
Missings: 0.00
NAs :7719.00
snap3 snap4 snap5 snap6
Not true :2810 Not true :2806 Not true :2783 Not true :2784
Somewhat true : 225 Somewhat true : 229 Somewhat true : 225 Somewhat true : 223
Certainly true: 23 Certainly true: 22 Certainly true: 49 Certainly true: 49
* : 0 * : 0 * : 0 * : 0
NAs :7812 NAs :7813 NAs :7813 NAs :7814
snap7 snap8 snap9 academic_achievement
Not true :2927 Not true :2260 Not true :2733 Min. : 1.000
Somewhat true : 96 Somewhat true : 669 Somewhat true : 288 1st Qu. : 3.286
Certainly true: 18 Certainly true: 127 Certainly true: 37 Median : 3.889
* : 0 * : 0 * : 0 Mean : 3.824
NAs :7829 NAs :7814 NAs :7812 3rd Qu. : 4.444
Max. : 6.000
Missings: 0.000
NAs :2204.000
# Get observations of data frame that have missing values and those with complete cases
library(psych)
d.miss <- d[!complete.cases(d),]
d.nomiss <- d[complete.cases(d),]
str(d.nomiss)
Data set with 2397 obs. of 12 variables:
$ gender : Nmnl. item w/ 2 labels for 0,1 num 0 0 0 0 0 0 0 0 0 0 ...
$ grade : Itvl. item + ms.v. num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap2 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap3 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 1 0 0 0 0 0 0 0 0 ...
$ snap4 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap5 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap6 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap7 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ snap8 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 1 0 ...
$ snap9 : Nmnl. item w/ 3 labels for 0,1,2 + ms.v. num 0 0 0 0 0 0 0 0 0 0 ...
$ academic_achievement: Itvl. item num 4.67 3.67 4.14 4.11 4.3 ...
headTail(as.data.frame(d.nomiss))
gender grade snap1 snap2 snap3 snap4 snap5
1 Girl 2 Not true Not true Not true Not true Not true
2 Girl 2 Not true Not true Somewhat true Not true Not true
3 Girl 2 Not true Not true Not true Not true Not true
4 Girl 2 Not true Not true Not true Not true Not true
... <NA> ... <NA> <NA> <NA> <NA> <NA>
2394 Boy 4 Somewhat true Somewhat true Somewhat true Somewhat true Not true
2395 Boy 4 Somewhat true Not true Not true Not true Not true
2396 Boy 4 Somewhat true Somewhat true Not true Not true Not true
2397 Boy 4 Somewhat true Not true Not true Not true Not true
snap6 snap7 snap8 snap9 academic_achievement
1 Not true Not true Not true Not true 4.67
2 Not true Not true Not true Not true 3.67
3 Not true Not true Not true Not true 4.14
4 Not true Not true Not true Not true 4.11
... <NA> <NA> <NA> <NA> ...
2394 Somewhat true Not true Somewhat true Not true 3.88
2395 Not true Not true Not true Not true 3.89
2396 Not true Somewhat true Somewhat true Somewhat true 3.78
2397 Not true Not true Not true Not true 2.56
summary(d.nomiss)
gender grade snap1 snap2
Girl:1256 Min. :2.000 Not true :2079 Not true :2117
Boy :1141 1st Qu.:2.000 Somewhat true : 272 Somewhat true : 230
Median :3.000 Certainly true: 46 Certainly true: 50
Mean :2.814
3rd Qu.:3.000
Max. :4.000
snap3 snap4 snap5 snap6
Not true :2201 Not true :2217 Not true :2190 Not true :2195
Somewhat true : 181 Somewhat true : 164 Somewhat true : 176 Somewhat true : 170
Certainly true: 15 Certainly true: 16 Certainly true: 31 Certainly true: 32
snap7 snap8 snap9 academic_achievement
Not true :2312 Not true :1794 Not true :2142 Min. :1.000
Somewhat true : 73 Somewhat true : 510 Somewhat true : 228 1st Qu.:3.556
Certainly true: 12 Certainly true: 93 Certainly true: 27 Median :4.083
Mean :4.023
3rd Qu.:4.556
Max. :5.900
D1 <- d.nomiss # For later use
summary(D1$snap1[D1$gender == "Boy"])
Not true Somewhat true Certainly true
935 176 30
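The split into complete and incomplete cases used above can be illustrated on a small toy data frame. This is only a sketch: the data frame `toy` and its values are hypothetical and not taken from the study data.

```r
# Toy data (hypothetical values) with some missing entries
toy <- data.frame(
  gender = c("Girl", "Boy", NA, "Girl"),
  snap1  = c(0, NA, 1, 2),
  ave    = c(4.2, 3.1, 3.9, 5.0)
)

keep <- complete.cases(toy)    # TRUE only for rows without any NA
toy.nomiss <- toy[keep, ]      # rows 1 and 4
toy.miss   <- toy[!keep, ]     # rows 2 and 3

nrow(toy.nomiss)
n_missing <- colSums(is.na(toy))   # number of missing values per variable
n_missing
```

Counting the missing values per variable before dropping rows shows which variables are responsible for most of the reduction in sample size.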
# Save the complete-case data D to a .csv file without row names for further analysis
D <- d.nomiss
write.csv(D, file = "../data2/inattention_nomiss_2397x12.csv",row.names=FALSE)
# For simplicity, we rename (and translate) the variable names in the dataset D without any missing values
library(plyr)
Attaching package: ‘plyr’
The following object is masked from ‘package:memisc’:
rename
d.nomiss <- read.csv(file = "../data/inattention_nomiss_2397x12.csv")
D <- d.nomiss
D <- rename(D, c(academic_achievement="ave"))
D$ave <- as.numeric(D$ave)
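# NOTE: the factor levels are sorted alphabetically ("Certainly true", "Not true",
# "Somewhat true"; "Boy", "Girl"). mapvalues() only relabels the levels, and
# as.numeric() on a factor returns the level index rather than the label, so the
# codes produced by this chunk are Certainly true = 0, Not true = 1, Somewhat true = 2
# and Boy = 0, Girl = 1 (see str(D)). Wrapping the factor in as.character() before
# as.numeric(), without the "- 1", would give Not true = 0, Somewhat true = 1,
# Certainly true = 2 and Girl = 0, Boy = 1.
D$snap1 <- mapvalues(as.factor(D$snap1), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap1 <- as.numeric(D$snap1)-1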
D$snap2 <- mapvalues(as.factor(D$snap2), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap2 <- as.numeric(D$snap2)-1
D$snap3 <- mapvalues(as.factor(D$snap3), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap3 <- as.numeric(D$snap3)-1
D$snap4 <- mapvalues(as.factor(D$snap4), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap4 <- as.numeric(D$snap4)-1
D$snap5 <- mapvalues(as.factor(D$snap5), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap5 <- as.numeric(D$snap5)-1
D$snap6 <- mapvalues(as.factor(D$snap6), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap6 <- as.numeric(D$snap6)-1
D$snap7 <- mapvalues(as.factor(D$snap7), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap7 <- as.numeric(D$snap7)-1
D$snap8 <- mapvalues(as.factor(D$snap8), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap8 <- as.numeric(D$snap8)-1
D$snap9 <- mapvalues(as.factor(D$snap9), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","2"))
D$snap9 <- as.numeric(D$snap9)-1
D$gender <- mapvalues(as.factor(D$gender), from = c("Girl", "Boy"), to = c("0", "1"))
D$gender <- as.numeric(D$gender)-1
D$grade <- as.numeric(D$grade)
str(D)
'data.frame': 2397 obs. of 12 variables:
$ gender: num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ ave : num 4.67 3.67 4.14 4.11 4.3 ...
headTail(D)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 ave
1 1 2 1 1 1 1 1 1 1 1 1 4.67
2 1 2 1 1 2 1 1 1 1 1 1 3.67
3 1 2 1 1 1 1 1 1 1 1 1 4.14
4 1 2 1 1 1 1 1 1 1 1 1 4.11
... ... ... ... ... ... ... ... ... ... ... ... ...
2394 0 4 2 2 2 2 1 2 1 2 1 3.88
2395 0 4 2 1 1 1 1 1 1 1 1 3.89
2396 0 4 2 2 1 1 1 1 2 2 2 3.78
2397 0 4 2 1 1 1 1 1 1 1 1 2.56
D3 <- D # For later use
# Save D (at early stage) to an .csv file for later analysis in R or MATLAB
write.csv(D, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2.csv",row.names=FALSE)
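Because the recoding in this chunk passes factors through `as.numeric()`, it can help to check on toy data how the level order affects the result. The sketch below uses hypothetical ratings and a hypothetical lookup vector `lookup`; it compares the level index with the numeric value of the label.

```r
# Toy ratings (hypothetical values), stored as a factor
x <- factor(c("Not true", "Somewhat true", "Certainly true", "Not true"))
levels(x)    # alphabetical: "Certainly true" "Not true" "Somewhat true"

# Relabel the levels with a named lookup vector
lookup <- c("Not true" = "0", "Somewhat true" = "1", "Certainly true" = "2")
levels(x) <- unname(lookup[levels(x)])

codes_index <- as.numeric(x) - 1              # level index minus one
codes_label <- as.numeric(as.character(x))    # numeric value of the label

codes_index   # 1 2 0 1
codes_label   # 0 1 2 0
```

Only the second variant follows the intended mapping, since it converts the labels themselves rather than their positions among the sorted levels.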
# For even more simplicity, we rename (and translate) the variable names in the dataset
# without any missing values, reducing the predictor categories to binary,
# i.e. collapsing SNAP values "1" and "2" to "1":
library(plyr)
D <- d.nomiss
D <- rename(D, c(academic_achievement="ave"))
D$ave <- as.numeric(D$ave)
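# NOTE: the factor levels are sorted alphabetically and as.numeric() returns the level
# index, so this chunk yields Somewhat/Certainly true = 0 and Not true = 1
# (and Boy = 0, Girl = 1, see str(D)); as.numeric(as.character(...)) without the "- 1"
# would give Not true = 0 and Somewhat/Certainly true = 1.
D$snap1 <- mapvalues(as.factor(D$snap1), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap1 <- as.numeric(D$snap1)-1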
D$snap2 <- mapvalues(as.factor(D$snap2), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap2 <- as.numeric(D$snap2)-1
D$snap3 <- mapvalues(as.factor(D$snap3), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap3 <- as.numeric(D$snap3)-1
D$snap4 <- mapvalues(as.factor(D$snap4), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap4 <- as.numeric(D$snap4)-1
D$snap5 <- mapvalues(as.factor(D$snap5), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap5 <- as.numeric(D$snap5)-1
D$snap6 <- mapvalues(as.factor(D$snap6), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap6 <- as.numeric(D$snap6)-1
D$snap7 <- mapvalues(as.factor(D$snap7), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap7 <- as.numeric(D$snap7)-1
D$snap8 <- mapvalues(as.factor(D$snap8), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap8 <- as.numeric(D$snap8)-1
D$snap9 <- mapvalues(as.factor(D$snap9), from = c("Not true","Somewhat true","Certainly true"), to = c("0","1","1"))
D$snap9 <- as.numeric(D$snap9)-1
D$gender <- mapvalues(as.factor(D$gender), from = c("Girl", "Boy"), to = c("0", "1"))
D$gender <- as.numeric(D$gender)-1
D$grade <- as.numeric(D$grade)
str(D)
'data.frame': 2397 obs. of 12 variables:
$ gender: num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 0 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 0 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ ave : num 4.67 3.67 4.14 4.11 4.3 ...
headTail(D)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 ave
1 1 2 1 1 1 1 1 1 1 1 1 4.67
2 1 2 1 1 0 1 1 1 1 1 1 3.67
3 1 2 1 1 1 1 1 1 1 1 1 4.14
4 1 2 1 1 1 1 1 1 1 1 1 4.11
... ... ... ... ... ... ... ... ... ... ... ... ...
2394 0 4 0 0 0 0 1 0 1 0 1 3.88
2395 0 4 0 1 1 1 1 1 1 1 1 3.89
2396 0 4 0 0 1 1 1 1 0 0 0 3.78
2397 0 4 0 1 1 1 1 1 1 1 1 2.56
D2 <- D # For later use
# Save the new D to an .csv file without row names for further analysis
write.csv(D, file = "../data/inattention_nomiss_2397x12_snap_is_0_1.csv",row.names=FALSE)
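If the items are already stored as numeric scores 0, 1, 2, the collapse to a binary indicator can also be written as a single comparison. This is a minimal sketch on hypothetical scores, assuming 0 means "Not true"; the vector names are illustrative only.

```r
# Hypothetical item scores: 0 = not true, 1 = somewhat true, 2 = certainly true
snap <- c(0, 1, 2, 0, 2)

# 0 stays 0; 1 and 2 both become 1
snap_bin <- as.integer(snap > 0)

table(snap, snap_bin)   # cross-check of original against collapsed scores
```

The cross-table is a quick way to confirm that each original score ended up in the intended binary category.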
D <- D3
s <- dim(D)
n <- s[1]
p <- s[2]
txt = sprintf("Structure of the %d x %d DATASET", n, p)
print(txt)
[1] "Structure of the 2397 x 12 DATASET"
library(DiagrammeR)
n_txt = sprintf("Dataset \n (N = %d)", n);
gviz <- grViz("
# Circles: predictor variables; Triangle: Outcome variable
digraph Structure_of_the_dataset_D {
# node definitions with substituted label text
node [fontname = Helvetica]
1 [label = 'Dataset \n (N = 2397)', shape=box]
2 [label = 'gender \n {Girl (0) | Boy (1)}', shape=circle]
3 [label = 'grade \n {2 | 3 | 4}', shape=circle]
4 [label = 'ave \n (average marks) \n [1, 6] or {low (L) | medium (M) | high (H)}', shape=triangle]
a [label = 'SNAP \n {0 | 1 | 2}', shape=oval]
b [label = 'SNAP1', shape=circle]
c [label = 'SNAP2', shape=circle]
d [label = 'SNAP3', shape=circle]
e [label = 'SNAP4', shape=circle]
f [label = 'SNAP5', shape=circle]
g [label = 'SNAP6', shape=circle]
h [label = 'SNAP7', shape=circle]
i [label = 'SNAP8', shape=circle]
j [label = 'SNAP9', shape=circle]
# edge definitions with the node IDs
1 -> {2 3 a 4}
a -> {b c d e f g h i j}
}",
engine = "dot")
print(gviz)
NULL
# This does not work using DiagrammeR / GraphViz
# png("../manuscript/Figs/graph_design.png")
# print(gviz)
# dev.off()
# Uses Viewer, Zoom and Screen capture to produce .png and then
# data_prep_structure_grviz_20160203.pdf file
In our analysis we included n = 2397 individuals (none with missing data) from the dataset “/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav”.
D <- D3
n_txt = sprintf("In our analysis we included n = %d individuals (none with missing data) from the dataset '%s'\n", nrow(D), fn);
print(n_txt)
[1] "In our analysis we included n = 2397 individuals (none with missing data) from the dataset '/Users/arvid/Dropbox/Arvid_inattention/data2/inattention_Arvid_new.sav'\n"
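We consider the grades (academic_achievement) both as a continuous variable (for regression) and as a discretized variable (for classification). The underlying register item is the grade point average ('gjennomsnitt'): ‘Karaktergjennomsnitt alle gyldige karakterer 1-6 (ikke kroppsøving)’, i.e. the average of all valid marks on the 1-6 scale, excluding physical education.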
# Discretized at three levels, with data-driven cutpoints (equifrequent levels)
D <- D3
aver <- D$ave
summary(aver)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 3.556 4.083 4.023 4.556 5.900
bins <- 3
cutpoints<-quantile(aver,(0:bins)/bins,names=FALSE)
print(cutpoints)
[1] 1.000000 3.750000 4.428571 5.900000
# Consistent with MATLAB 'histcounts' (D_20151110_analysis.m ; T2)
# fn2 = '../data/D_20151110.csv';
# T2 = readtable(fn2);
# bins = 3;
# y = quantile(T2.ave,[0:bins]/bins)
# [N,EDGES,BIN] = histcounts(T2.ave,y);
# cuts = sprintf('1:[%.2f, %.2f) 2:[%.2f,%.2f) 3:[%.2f,%.2f]', EDGES(1), EDGES(2), EDGES(2), EDGES(3), EDGES(3), EDGES(4));
# T2.ave_cat = BIN; % categorical(BIN,'Ordinal',true);
# descr = sprintf('%s - 1:low, 2:medium; 3:high average mark', cuts);
# T2.Properties.VariableDescriptions{'ave_cat'} = descr;
# => descr = 1:[1.00, 3.75) 2:[3.75,4.43) 3:[4.43,5.90] - 1:low, 2:medium; 3:high average mark
averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE)
summary(averBinned)
[1,3.75) [3.75,4.43) [4.43,5.9]
779 818 800
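The equal-frequency binning used here (quantile cutpoints passed to `cut()` with left-closed intervals) can be traced on a small example. The marks below are hypothetical values chosen for illustration, not study data.

```r
# Nine hypothetical average marks
marks <- c(2.1, 3.0, 3.4, 3.8, 4.0, 4.2, 4.5, 4.9, 5.3)

bins <- 3
# Cutpoints at the 0, 1/3, 2/3 and 1 quantiles
cuts <- quantile(marks, (0:bins) / bins, names = FALSE)

# Left-closed intervals [a, b); include.lowest = TRUE closes the last one on the right
binned <- cut(marks, cuts, right = FALSE, include.lowest = TRUE,
              labels = c("L", "M", "H"))

bin_counts <- as.vector(table(binned))
bin_counts   # three marks in each of L, M, H
```

With tied values at a cutpoint the groups can differ in size, which is one reason the three groups in the study sample are similar but not identical in size.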
Make a histogram of the discretized ‘averBinned’:
hist(as.numeric(averBinned))
Define grade categories “low”, “medium” and “high” in terms of the calculated cut-point intervals:
txt_low <- sprintf("low (L): [%.3f, %.3f)\n", cutpoints[[1]], cutpoints[[2]])
print(txt_low)
[1] "low (L): [1.000, 3.750)\n"
txt_medium <- sprintf("medium (M): [%.3f, %.3f)\n", cutpoints[[2]], cutpoints[[3]])
print(txt_medium)
[1] "medium (M): [3.750, 4.429)\n"
txt_high <- sprintf("high (H): [%.3f, %.3f]\n", cutpoints[[3]], cutpoints[[4]])
print(txt_high)
[1] "high (H): [4.429, 5.900]\n"
library(psych)
# Dataset for classification based on D3 and discretized average academic achievement (outcome labelled L, M, H)
C <- D3
C$averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE,
labels=c("L","M","H"))
C <- subset(C, select = -c(ave))
str(C)
'data.frame': 2397 obs. of 12 variables:
$ gender : num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ averBinned: Factor w/ 3 levels "L","M","H": 3 1 2 2 2 2 1 2 3 2 ...
headTail(as.data.frame(C))
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
1 1 2 1 1 1 1 1 1 1 1 1 H
2 1 2 1 1 2 1 1 1 1 1 1 L
3 1 2 1 1 1 1 1 1 1 1 1 M
4 1 2 1 1 1 1 1 1 1 1 1 M
... ... ... ... ... ... ... ... ... ... ... ... <NA>
2394 0 4 2 2 2 2 1 2 1 2 1 M
2395 0 4 2 1 1 1 1 1 1 1 1 M
2396 0 4 2 2 1 1 1 1 2 2 2 M
2397 0 4 2 1 1 1 1 1 1 1 1 L
headTail(as.data.frame(D3))
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 ave
1 1 2 1 1 1 1 1 1 1 1 1 4.67
2 1 2 1 1 2 1 1 1 1 1 1 3.67
3 1 2 1 1 1 1 1 1 1 1 1 4.14
4 1 2 1 1 1 1 1 1 1 1 1 4.11
... ... ... ... ... ... ... ... ... ... ... ... ...
2394 0 4 2 2 2 2 1 2 1 2 1 3.88
2395 0 4 2 1 1 1 1 1 1 1 1 3.89
2396 0 4 2 2 1 1 1 1 2 2 2 3.78
2397 0 4 2 1 1 1 1 1 1 1 1 2.56
# Save the dataset C with three-level (0/1/2) SNAP predictors and three-level (L/M/H) outcome
# to a .csv file for further analysis
write.csv(C, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2_outcome_is_L_M_H.csv",row.names=FALSE)
# Dataset for classification based on D3 and discretized average academic achievement (outcome labelled 0, 1, 2)
E <- D3
E$averBinned <- cut(aver, cutpoints, right=FALSE, include.lowest=TRUE,
labels=c("0","1","2"))
E <- subset(E, select = -c(ave))
str(E)
'data.frame': 2397 obs. of 12 variables:
$ gender : num 1 1 1 1 1 1 1 1 1 1 ...
$ grade : num 2 2 2 2 2 2 2 2 2 2 ...
$ snap1 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap2 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap3 : num 1 2 1 1 1 1 1 1 1 1 ...
$ snap4 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap5 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap6 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap7 : num 1 1 1 1 1 1 1 1 1 1 ...
$ snap8 : num 1 1 1 1 1 1 1 1 2 1 ...
$ snap9 : num 1 1 1 1 1 1 1 1 1 1 ...
$ averBinned: Factor w/ 3 levels "0","1","2": 3 1 2 2 2 2 1 2 3 2 ...
summary(E)
gender grade snap1 snap2 snap3
Min. :0.000 Min. :2.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :3.000 Median :1.000 Median :1.000 Median :1.000
Mean :0.524 Mean :2.814 Mean :1.094 Mean :1.075 Mean :1.069
3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :1.000 Max. :4.000 Max. :2.000 Max. :2.000 Max. :2.000
snap4 snap5 snap6 snap7 snap8
Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:1.000 1st Qu.:1.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :1.00 Median :1.000 Median :1.000 Median :1.000
Mean :1.062 Mean :1.06 Mean :1.058 Mean :1.025 Mean :1.174
3rd Qu.:1.000 3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :2.000 Max. :2.00 Max. :2.000 Max. :2.000 Max. :2.000
snap9 averBinned
Min. :0.000 0:779
1st Qu.:1.000 1:818
Median :1.000 2:800
Mean :1.084
3rd Qu.:1.000
Max. :2.000
headTail(as.data.frame(E))
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
1 1 2 1 1 1 1 1 1 1 1 1 2
2 1 2 1 1 2 1 1 1 1 1 1 0
3 1 2 1 1 1 1 1 1 1 1 1 1
4 1 2 1 1 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ... ... ... ... <NA>
2394 0 4 2 2 2 2 1 2 1 2 1 1
2395 0 4 2 1 1 1 1 1 1 1 1 1
2396 0 4 2 2 1 1 1 1 2 2 2 1
2397 0 4 2 1 1 1 1 1 1 1 1 0
headTail(as.data.frame(D3))
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 ave
1 1 2 1 1 1 1 1 1 1 1 1 4.67
2 1 2 1 1 2 1 1 1 1 1 1 3.67
3 1 2 1 1 1 1 1 1 1 1 1 4.14
4 1 2 1 1 1 1 1 1 1 1 1 4.11
... ... ... ... ... ... ... ... ... ... ... ... ...
2394 0 4 2 2 2 2 1 2 1 2 1 3.88
2395 0 4 2 1 1 1 1 1 1 1 1 3.89
2396 0 4 2 2 1 1 1 1 2 2 2 3.78
2397 0 4 2 1 1 1 1 1 1 1 1 2.56
# Save the dataset E with numerical SNAP predictors and trinary outcome to an .csv file
# for further analysis
write.csv(E, file = "../data/inattention_nomiss_2397x12_snap_is_0_1_2_outcome_is_0_1_2.csv",row.names=FALSE)
library(xtable)
C <- as.data.frame(C)
# select columns
cols <- c("gender", "grade", "snap1", "snap2", "snap3", "snap4", "snap5", "snap6", "snap7", "snap8", "snap9", "averBinned")
C[,cols] <- data.frame(apply(C[cols], 2, as.factor))
levels(C$gender) <- c("G", "B")
levels(C$grade) <- c("2nd", "3rd", "4th")
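# N - not true (0)
# S - somewhat true (1)
# C - certainly true (2)
# The labels are assigned to the sorted codes 0, 1, 2 (and "G", "B" to 0, 1), so they
# carry this meaning only if D3 codes Not true = 0, Somewhat true = 1, Certainly true = 2
# and Girl = 0, Boy = 1.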
levels(C$snap1) <- c("N", "S", "C")
levels(C$snap2) <- c("N", "S", "C")
levels(C$snap3) <- c("N", "S", "C")
levels(C$snap4) <- c("N", "S", "C")
levels(C$snap5) <- c("N", "S", "C")
levels(C$snap6) <- c("N", "S", "C")
levels(C$snap7) <- c("N", "S", "C")
levels(C$snap8) <- c("N", "S", "C")
levels(C$snap9) <- c("N", "S", "C")
levels(C$averBinned) <- c("H", "L", "M") # levels are already sorted alphabetically (H, L, M), so this keeps them unchanged
str(C)
'data.frame': 2397 obs. of 12 variables:
$ gender : Factor w/ 2 levels "G","B": 2 2 2 2 2 2 2 2 2 2 ...
$ grade : Factor w/ 3 levels "2nd","3rd","4th": 1 1 1 1 1 1 1 1 1 1 ...
$ snap1 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap2 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap3 : Factor w/ 3 levels "N","S","C": 2 3 2 2 2 2 2 2 2 2 ...
$ snap4 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap5 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap6 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap7 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ snap8 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 3 2 ...
$ snap9 : Factor w/ 3 levels "N","S","C": 2 2 2 2 2 2 2 2 2 2 ...
$ averBinned: Factor w/ 3 levels "H","L","M": 1 2 3 3 3 3 2 3 1 3 ...
headTail(C)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8 snap9 averBinned
1 B 2nd S S S S S S S S S H
2 B 2nd S S C S S S S S S L
3 B 2nd S S S S S S S S S M
4 B 2nd S S S S S S S S S M
... <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2394 G 4th C C C C S C S C S M
2395 G 4th C S S S S S S S S M
2396 G 4th C C S S S S C C C M
2397 G 4th C S S S S S S S S L
summary(C)
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7 snap8
G:1141 2nd:1008 N: 46 N: 50 N: 15 N: 16 N: 31 N: 32 N: 12 N: 93
B:1256 3rd: 827 S:2079 S:2117 S:2201 S:2217 S:2190 S:2195 S:2312 S:1794
4th: 562 C: 272 C: 230 C: 181 C: 164 C: 176 C: 170 C: 73 C: 510
snap9 averBinned
N: 27 H:800
S:2142 L:779
C: 228 M:818
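An alternative way to attach labels is to state the code-to-label mapping explicitly with `factor(levels = , labels = )`, so that the result does not depend on the sort order of the existing levels. The sketch below uses hypothetical item scores and assumes the coding 0 = not true, 1 = somewhat true, 2 = certainly true.

```r
# Hypothetical item scores: 0 = not true, 1 = somewhat true, 2 = certainly true
codes <- c(0, 2, 1, 0, 0)

# Each label is tied to its code by position in 'levels' and 'labels'
item <- factor(codes, levels = c(0, 1, 2), labels = c("N", "S", "C"))

as.character(item)   # "N" "C" "S" "N" "N"
table(item)
```

Comparing `table(item)` with the counts of the original variable is a simple check that no category was relabelled by accident.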
xtable(summary(C))
% latex table generated in R 3.3.1 by xtable 1.8-2 package
% Tue Aug 23 10:29:28 2016
\begin{table}[ht]
\centering
\begin{tabular}{rllllllllllll}
\hline
& gender & grade & snap1 & snap2 & snap3 & snap4 & snap5 & snap6 & snap7 & snap8 & snap9 & averBinned \\
\hline
1 & G:1141 & 2nd:1008 & N: 46 & N: 50 & N: 15 & N: 16 & N: 31 & N: 32 & N: 12 & N: 93 & N: 27 & H:800 \\
2 & B:1256 & 3rd: 827 & S:2079 & S:2117 & S:2201 & S:2217 & S:2190 & S:2195 & S:2312 & S:1794 & S:2142 & L:779 \\
3 & & 4th: 562 & C: 272 & C: 230 & C: 181 & C: 164 & C: 176 & C: 170 & C: 73 & C: 510 & C: 228 & M:818 \\
\hline
\end{tabular}
\end{table}
# Save the dataset C with SNAP predictors as factors and trinary outcome to an .csv file
# for further analysis
write.csv(C, file = "../data/inattention_nomiss_2397x12_snap_is_N_S_C_outcome_is_L_M_H.csv",row.names=FALSE)
library(Hmisc)
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2
Attaching package: ‘ggplot2’
The following objects are masked from ‘package:psych’:
%+%, alpha
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:xtable’:
label, label<-
The following objects are masked from ‘package:plyr’:
is.discrete, summarize
The following object is masked from ‘package:psych’:
describe
The following objects are masked from ‘package:memisc’:
%nin%, html
The following object is masked from ‘tools:rstudio’:
print.html
The following objects are masked from ‘package:base’:
format.pval, round.POSIXt, trunc.POSIXt, units
describe(C)
C
12 Variables 2397 Observations
----------------------------------------------------------------------------------------------
gender
n missing unique
2397 0 2
G (1141, 48%), B (1256, 52%)
----------------------------------------------------------------------------------------------
grade
n missing unique
2397 0 3
2nd (1008, 42%), 3rd (827, 35%), 4th (562, 23%)
----------------------------------------------------------------------------------------------
snap1
n missing unique
2397 0 3
N (46, 2%), S (2079, 87%), C (272, 11%)
----------------------------------------------------------------------------------------------
snap2
n missing unique
2397 0 3
N (50, 2%), S (2117, 88%), C (230, 10%)
----------------------------------------------------------------------------------------------
snap3
n missing unique
2397 0 3
N (15, 1%), S (2201, 92%), C (181, 8%)
----------------------------------------------------------------------------------------------
snap4
n missing unique
2397 0 3
N (16, 1%), S (2217, 92%), C (164, 7%)
----------------------------------------------------------------------------------------------
snap5
n missing unique
2397 0 3
N (31, 1%), S (2190, 91%), C (176, 7%)
----------------------------------------------------------------------------------------------
snap6
n missing unique
2397 0 3
N (32, 1%), S (2195, 92%), C (170, 7%)
----------------------------------------------------------------------------------------------
snap7
n missing unique
2397 0 3
N (12, 1%), S (2312, 96%), C (73, 3%)
----------------------------------------------------------------------------------------------
snap8
n missing unique
2397 0 3
N (93, 4%), S (1794, 75%), C (510, 21%)
----------------------------------------------------------------------------------------------
snap9
n missing unique
2397 0 3
N (27, 1%), S (2142, 89%), C (228, 10%)
----------------------------------------------------------------------------------------------
averBinned
n missing unique
2397 0 3
H (800, 33%), L (779, 32%), M (818, 34%)
----------------------------------------------------------------------------------------------
library(pander)
panderOptions("digits", 5)
pander(summary(C))
-------------------------------------------------------------------------
gender grade snap1 snap2 snap3 snap4 snap5 snap6 snap7
-------- -------- ------- ------- ------- ------- ------- ------- -------
G:1141 2nd:1008 N: 46 N: 50 N: 15 N: 16 N: 31 N: 32 N: 12
B:1256 3rd: 827 S:2079 S:2117 S:2201 S:2217 S:2190 S:2195 S:2312
NA 4th: 562 C: 272 C: 230 C: 181 C: 164 C: 176 C: 170 C: 73
-------------------------------------------------------------------------
Table: Table continues below
----------------------------
snap8 snap9 averBinned
------- ------- ------------
N: 93 N: 27 H:800
S:1794 S:2142 L:779
C: 510 C: 228 M:818
----------------------------
pander(summary(E))
---------------------------------------------------------------------
gender grade snap1 snap2 snap3
------------- ------------- ------------- ------------- -------------
Min. :0.000 Min. :2.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :3.000 Median :1.000 Median :1.000 Median :1.000
Mean :0.524 Mean :2.814 Mean :1.094 Mean :1.075 Mean :1.069
3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :1.000 Max. :4.000 Max. :2.000 Max. :2.000 Max. :2.000
---------------------------------------------------------------------
Table: Table continues below
--------------------------------------------------------------------
snap4 snap5 snap6 snap7 snap8
------------- ------------ ------------- ------------- -------------
Min. :0.000 Min. :0.00 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:1.000 1st Qu.:1.00 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
Median :1.000 Median :1.00 Median :1.000 Median :1.000 Median :1.000
Mean :1.062 Mean :1.06 Mean :1.058 Mean :1.025 Mean :1.174
3rd Qu.:1.000 3rd Qu.:1.00 3rd Qu.:1.000 3rd Qu.:1.000 3rd Qu.:1.000
Max. :2.000 Max. :2.00 Max. :2.000 Max. :2.000 Max. :2.000
--------------------------------------------------------------------
Table: Table continues below
--------------------------
snap9 averBinned
------------- ------------
Min. :0.000 0:779
1st Qu.:1.000 1:818
Median :1.000 2:800
Mean :1.084 NA
3rd Qu.:1.000 NA
Max. :2.000 NA
--------------------------
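The numeric summary of `E` appears consistent with the factor counts reported for `C` if the levels are coded N = 0, S = 1, C = 2 for the items and G = 0, B = 1 for gender. That coding is inferred from the printed tables, not taken from the code shown here, so treat it as an assumption. A minimal sketch of the arithmetic check, using counts copied from the factor summary above:

```r
# Counts for snap1 copied from the factor summary of C (levels N, S, C).
snap1_counts <- c(N = 46, S = 2079, C = 272)

# Assumed numeric coding, inferred from the tables: N = 0, S = 1, C = 2.
snap_codes <- c(N = 0, S = 1, C = 2)

# Weighted mean of the codes; compare with the mean reported for snap1 in E.
snap1_mean <- sum(snap_codes * snap1_counts) / sum(snap1_counts)
round(snap1_mean, 3)

# Same check for gender, assuming G = 0 and B = 1.
gender_counts <- c(G = 1141, B = 1256)
gender_codes <- c(G = 0, B = 1)
gender_mean <- sum(gender_codes * gender_counts) / sum(gender_counts)
round(gender_mean, 3)
```

Under these assumed codings the two values agree with the means printed for `snap1` (1.094) and `gender` (0.524) in the summary of `E`.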
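Describe subsets of the data by academic achievement level (lowest, L, and highest, H) and gender
# Rows for girls (G) and boys (B) in the lowest (L) and highest (H) achievement groups.
# which() keeps only indices where the condition is TRUE, so rows with NA are dropped.
C.girls.L <- C[ which(C$gender=='G' & C$averBinned=='L'), ]
C.girls.H <- C[ which(C$gender=='G' & C$averBinned=='H'), ]
C.boys.L <- C[ which(C$gender=='B' & C$averBinned=='L'), ]
C.boys.H <- C[ which(C$gender=='B' & C$averBinned=='H'), ]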
summary(C.girls.L)
 gender   grade    snap1    snap2    snap3    snap4    snap5    snap6    snap7    snap8    snap9
 G:447    2nd:183  N: 23    N: 33    N:  9    N:  9    N: 17    N: 16    N:  5    N: 48    N: 14
 B:  0    3rd:163  S:328    S:313    S:361    S:365    S:353    S:351    S:406    S:226    S:355
          4th:101  C: 96    C:101    C: 77    C: 73    C: 77    C: 80    C: 36    C:173    C: 78
 averBinned
 H:  0
 L:447
 M:  0
summary(C.girls.H)
 gender   grade    snap1    snap2    snap3    snap4    snap5    snap6    snap7    snap8    snap9
 G:305    2nd:118  N:  1    N:  3    N:  0    N:  0    N:  5    N:  2    N:  0    N:  7    N:  2
 B:  0    3rd:105  S:282    S:286    S:284    S:289    S:284    S:289    S:300    S:245    S:281
          4th: 82  C: 22    C: 16    C: 21    C: 16    C: 16    C: 14    C:  5    C: 53    C: 22
 averBinned
 H:305
 L:  0
 M:  0
summary(C.boys.L)
 gender   grade    snap1    snap2    snap3    snap4    snap5    snap6    snap7    snap8    snap9
 G:  0    2nd:128  N:  6    N:  3    N:  1    N:  0    N:  3    N:  3    N:  3    N: 12    N:  2
 B:332    3rd:100  S:284    S:284    S:311    S:304    S:303    S:299    S:321    S:245    S:295
          4th:104  C: 42    C: 45    C: 20    C: 28    C: 26    C: 30    C:  8    C: 75    C: 35
 averBinned
 H:  0
 L:332
 M:  0
summary(C.boys.H)
 gender   grade    snap1    snap2    snap3    snap4    snap5    snap6    snap7    snap8    snap9
 G:  0    2nd:232  N:  2    N:  0    N:  0    N:  0    N:  0    N:  0    N:  0    N:  1    N:  1
 B:495    3rd:146  S:474    S:488    S:486    S:491    S:489    S:493    S:493    S:450    S:476
          4th:117  C: 19    C:  7    C:  9    C:  4    C:  6    C:  2    C:  2    C: 44    C: 18
 averBinned
 H:495
 L:  0
 M:  0
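The zero-count entries in these summaries (for example `B: 0` or `M: 0`) arise because subsetting a data frame keeps every factor level, including levels that no longer occur. If those rows are unwanted, base R's `droplevels()` can remove them. A minimal sketch on a toy data frame; `C_toy` and its values are hypothetical stand-ins for `C`, not the study data:

```r
# Toy data frame standing in for C (hypothetical values, same column names).
C_toy <- data.frame(
  gender     = factor(c("G", "B", "G", "B"), levels = c("G", "B")),
  averBinned = factor(c("L", "L", "H", "M"), levels = c("H", "L", "M"))
)

# Same subsetting pattern as in the notebook; unused levels are kept.
girls_L <- C_toy[which(C_toy$gender == "G" & C_toy$averBinned == "L"), ]
table(girls_L$gender)          # still lists level B, with a count of 0

# droplevels() removes factor levels that do not occur in the subset.
girls_L_clean <- droplevels(girls_L)
table(girls_L_clean$gender)    # only level G remains
levels(girls_L_clean$averBinned)
```

Keeping the empty levels is harmless for `summary()`, but dropping them can make later tables and plots of a single subset easier to read.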
library(Hmisc)
describe(C.girls.L)
C.girls.L
12 Variables 447 Observations
----------------------------------------------------------------------------------------------
gender
n missing unique value
447 0 1 G
----------------------------------------------------------------------------------------------
grade
n missing unique
447 0 3
2nd (183, 41%), 3rd (163, 36%), 4th (101, 23%)
----------------------------------------------------------------------------------------------
snap1
n missing unique
447 0 3
N (23, 5%), S (328, 73%), C (96, 21%)
----------------------------------------------------------------------------------------------
snap2
n missing unique
447 0 3
N (33, 7%), S (313, 70%), C (101, 23%)
----------------------------------------------------------------------------------------------
snap3
n missing unique
447 0 3
N (9, 2%), S (361, 81%), C (77, 17%)
----------------------------------------------------------------------------------------------
snap4
n missing unique
447 0 3
N (9, 2%), S (365, 82%), C (73, 16%)
----------------------------------------------------------------------------------------------
snap5
n missing unique
447 0 3
N (17, 4%), S (353, 79%), C (77, 17%)
----------------------------------------------------------------------------------------------
snap6
n missing unique
447 0 3
N (16, 4%), S (351, 79%), C (80, 18%)
----------------------------------------------------------------------------------------------
snap7
n missing unique
447 0 3
N (5, 1%), S (406, 91%), C (36, 8%)
----------------------------------------------------------------------------------------------
snap8
n missing unique
447 0 3
N (48, 11%), S (226, 51%), C (173, 39%)
----------------------------------------------------------------------------------------------
snap9
n missing unique
447 0 3
N (14, 3%), S (355, 79%), C (78, 17%)
----------------------------------------------------------------------------------------------
averBinned
n missing unique value
447 0 1 L
----------------------------------------------------------------------------------------------
describe(C.girls.H)
C.girls.H
12 Variables 305 Observations
----------------------------------------------------------------------------------------------
gender
n missing unique value
305 0 1 G
----------------------------------------------------------------------------------------------
grade
n missing unique
305 0 3
2nd (118, 39%), 3rd (105, 34%), 4th (82, 27%)
----------------------------------------------------------------------------------------------
snap1
n missing unique
305 0 3
N (1, 0%), S (282, 92%), C (22, 7%)
----------------------------------------------------------------------------------------------
snap2
n missing unique
305 0 3
N (3, 1%), S (286, 94%), C (16, 5%)
----------------------------------------------------------------------------------------------
snap3
n missing unique
305 0 2
S (284, 93%), C (21, 7%)
----------------------------------------------------------------------------------------------
snap4
n missing unique
305 0 2
S (289, 95%), C (16, 5%)
----------------------------------------------------------------------------------------------
snap5
n missing unique
305 0 3
N (5, 2%), S (284, 93%), C (16, 5%)
----------------------------------------------------------------------------------------------
snap6
n missing unique
305 0 3
N (2, 1%), S (289, 95%), C (14, 5%)
----------------------------------------------------------------------------------------------
snap7
n missing unique
305 0 2
S (300, 98%), C (5, 2%)
----------------------------------------------------------------------------------------------
snap8
n missing unique
305 0 3
N (7, 2%), S (245, 80%), C (53, 17%)
----------------------------------------------------------------------------------------------
snap9
n missing unique
305 0 3
N (2, 1%), S (281, 92%), C (22, 7%)
----------------------------------------------------------------------------------------------
averBinned
n missing unique value
305 0 1 H
----------------------------------------------------------------------------------------------
describe(C.boys.L)
C.boys.L
12 Variables 332 Observations
----------------------------------------------------------------------------------------------
gender
n missing unique value
332 0 1 B
----------------------------------------------------------------------------------------------
grade
n missing unique
332 0 3
2nd (128, 39%), 3rd (100, 30%), 4th (104, 31%)
----------------------------------------------------------------------------------------------
snap1
n missing unique
332 0 3
N (6, 2%), S (284, 86%), C (42, 13%)
----------------------------------------------------------------------------------------------
snap2
n missing unique
332 0 3
N (3, 1%), S (284, 86%), C (45, 14%)
----------------------------------------------------------------------------------------------
snap3
n missing unique
332 0 3
N (1, 0%), S (311, 94%), C (20, 6%)
----------------------------------------------------------------------------------------------
snap4
n missing unique
332 0 2
S (304, 92%), C (28, 8%)
----------------------------------------------------------------------------------------------
snap5
n missing unique
332 0 3
N (3, 1%), S (303, 91%), C (26, 8%)
----------------------------------------------------------------------------------------------
snap6
n missing unique
332 0 3
N (3, 1%), S (299, 90%), C (30, 9%)
----------------------------------------------------------------------------------------------
snap7
n missing unique
332 0 3
N (3, 1%), S (321, 97%), C (8, 2%)
----------------------------------------------------------------------------------------------
snap8
n missing unique
332 0 3
N (12, 4%), S (245, 74%), C (75, 23%)
----------------------------------------------------------------------------------------------
snap9
n missing unique
332 0 3
N (2, 1%), S (295, 89%), C (35, 11%)
----------------------------------------------------------------------------------------------
averBinned
n missing unique value
332 0 1 L
----------------------------------------------------------------------------------------------
describe(C.boys.H)
C.boys.H
12 Variables 495 Observations
----------------------------------------------------------------------------------------------
gender
n missing unique value
495 0 1 B
----------------------------------------------------------------------------------------------
grade
n missing unique
495 0 3
2nd (232, 47%), 3rd (146, 29%), 4th (117, 24%)
----------------------------------------------------------------------------------------------
snap1
n missing unique
495 0 3
N (2, 0%), S (474, 96%), C (19, 4%)
----------------------------------------------------------------------------------------------
snap2
n missing unique
495 0 2
S (488, 99%), C (7, 1%)
----------------------------------------------------------------------------------------------
snap3
n missing unique
495 0 2
S (486, 98%), C (9, 2%)
----------------------------------------------------------------------------------------------
snap4
n missing unique
495 0 2
S (491, 99%), C (4, 1%)
----------------------------------------------------------------------------------------------
snap5
n missing unique
495 0 2
S (489, 99%), C (6, 1%)
----------------------------------------------------------------------------------------------
snap6
n missing unique
495 0 2
S (493, 100%), C (2, 0%)
----------------------------------------------------------------------------------------------
snap7
n missing unique
495 0 2
S (493, 100%), C (2, 0%)
----------------------------------------------------------------------------------------------
snap8
n missing unique
495 0 3
N (1, 0%), S (450, 91%), C (44, 9%)
----------------------------------------------------------------------------------------------
snap9
n missing unique
495 0 3
N (1, 0%), S (476, 96%), C (18, 4%)
----------------------------------------------------------------------------------------------
averBinned
n missing unique value
495 0 1 H
----------------------------------------------------------------------------------------------
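The four `describe()` listings are long, so comparing one item across subsets means scanning back and forth. As a sketch of a more compact view, the counts for `snap8` can be collected into one matrix and converted to row percentages with base R's `prop.table()`. The counts below are copied from the output above; the matrix name `snap8` and the row labels are chosen here for illustration only:

```r
# snap8 counts copied from the describe() output (rows: subset; columns: N, S, C).
snap8 <- rbind(
  girls.L = c(N = 48, S = 226, C = 173),
  girls.H = c(N = 7,  S = 245, C = 53),
  boys.L  = c(N = 12, S = 245, C = 75),
  boys.H  = c(N = 1,  S = 450, C = 44)
)

# Row totals should match the subset sizes reported above (447, 305, 332, 495).
rowSums(snap8)

# Row-wise percentages, so the four subsets can be compared directly.
snap8_pct <- round(100 * prop.table(snap8, margin = 1))
snap8_pct
```

With the real data, the same table could presumably be built directly from `C` using `table()` on a grouping variable and `snap8`, rather than by retyping counts.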